Evolution of the NBA In the Past Decade
Introduction
Basketball is a sport that is continuously evolving. Since its invention, there have been many changes in both the rules and the style of play. Over the past decade, the league has undergone a rapid change due to both the increase in the use of analytics and changes in officiating.
NBA basketball in the 2010s will be remembered for the introduction of the “pace and space” era, in which teams played at a faster pace than in the previous two decades and the number of three-point shots per game exploded. Stylistically, many teams saw the success of Stephen Curry and the Golden State Warriors, as well as the Houston Rockets, and began to emphasize spacing the floor to draw out defenders.
The rise of analytics, especially with Daryl Morey and the Rockets, has led to more layups and three-point attempts while also causing a sharp decline in midrange shots, including long two-pointers. The midrange jump shot, a staple for superstars such as Kobe Bryant and Michael Jordan, has been mostly abandoned except by the most efficient scorers, such as Chris Paul and Kevin Durant. Combined with officiating that favors offense over defense, this means players are now scoring more points with better efficiency than 10 years ago.
This blog will use two methods to observe the shift in strategy over the history of the NBA. The first is to analyze shot location data, tracking where players have shot the ball since the 2010-2011 season. The second is to compare present-day seasons to historical seasons. Finding the top historical comparisons for today's players is interesting because one can determine whether any of them are similar to historical greats. In addition, one can examine how the league has changed by comparing “old-school” players such as DeMar DeRozan with more modern players such as Nikola Jokic.
Shot Data and Mapping
To get the shot data, we found an R package online called BallR (ballr), which gets shot map data from the NBA Stats website and API.
Shot Data using ballr package and the NBA API
The BallR package uses the NBA Stats API to help visualize shots taken by a player for a season. To run BallR, we have to run the following code in the console (taken from the BallR documentation).
packages = c("shiny", "tidyverse", "hexbin")
install.packages(packages, repos = "https://cran.rstudio.com/")
library(shiny)
runGitHub("ballr", "toddwschneider")
This will run the shiny app locally on your computer and will load the following functions and methods that we will need to use to generate a heat map of shot density:
- court_maps.csv, which is a dataframe that holds all the points of the different zones of a basketball court, like a map
- plot_court.R and plot_theme.R, which are methods needed to draw the court
- generate_heatmap_chart, which is a function used to generate the heat map of shot data
- fetch_shots_by_player_id_and_season, which is a function used to get shot data from the NBA API using a player’s ID and the season they were playing in
The last function is the one we used to collect the shot map data for all players in all seasons. To do this, we first had to figure out the player IDs for all players from the seasons between 2010 and 2021. After that, we iterated through every combination of player ID and season (if they played in that season) and used the fetch shots function to get the shot data for each one. Running this code, we noticed that there seemed to be a hard limit of around 600 iterations, so we had to restart the loop every time it reached the limit. The example code below shows our method for extracting all the shot map data.
## df is the final data set we are outputting; start empty so rbind() can append to it
df <- NULL
## player_stats is a dataframe with person_id and season columns
for (i in seq_len(nrow(player_stats))) {
  ## season_type is the argument name in the ballr source, as far as we can tell
  stats <- fetch_shots_by_player_id_and_season(
    player_id = player_stats$person_id[i],
    season = player_stats$season[i],
    season_type = "Regular Season"
  )
  ## keep the player's shots and tag them with the season
  stats <- stats$player %>%
    mutate(season = player_stats$season[i])
  df <- rbind(df, stats)
  ## uncomment if you want to see the code running to make sure it isn't frozen
  # print(i)
  ## pause after every 15 requests so the API does not cut us off
  if (i %% 15 == 0) {
    Sys.sleep(5)
    # print("Done Sleeping")
  }
}
write.csv(df, "Data/total_shot_data.csv")
Our final data set was a .csv file over half a gigabyte in size with over 1.9 million observations, produced after running our code for a few hours.
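Since the loop above had to be restarted by hand, one option is to wrap each API call in a small retry helper. The sketch below is only an illustration: fetch_with_retry and flaky_fetch are hypothetical names, the failure is simulated, and we do not know the actual cause of the limit, so this may not remove the need for restarts.

```r
# Hypothetical helper: call `fetch_fun` and retry with a growing pause on error.
# `fetch_fun` stands in for a call such as fetch_shots_by_player_id_and_season().
fetch_with_retry <- function(fetch_fun, ..., max_tries = 3, wait_seconds = 5) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fetch_fun(...), error = function(e) NULL)
    if (!is.null(result)) {
      return(result)
    }
    # Back off a little longer after each failed attempt
    Sys.sleep(wait_seconds * attempt)
  }
  NULL  # signal that every attempt failed
}

# Stand-in for the API call (simulated, not real data): fails twice, then succeeds
calls <- 0
flaky_fetch <- function(player_id, season) {
  calls <<- calls + 1
  if (calls < 3) stop("simulated API failure")
  data.frame(player_id = player_id, season = season, shots = 10)
}

# 12345 is an arbitrary placeholder ID; wait_seconds = 0 keeps the demo instant
shots <- fetch_with_retry(flaky_fetch, player_id = 12345, season = "2015-16",
                          wait_seconds = 0)
```

A NULL return marks a player-season that still failed after every attempt, so those rows could be collected and retried later instead of stopping the whole run.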
Player Season Data
The data used was scraped from Basketball Reference. For each player and each season since 1982, the per-game and advanced statistics were collected. The original dataset had over 17,000 player-seasons, each with 43 statistical variables. In order to cut down the number of players analyzed, we decided to only examine “good” seasons, which were seasons where the player had a VORP (value over replacement player) of at least 1. To put this into context, there are usually around 100 players per year with over 1 VORP. To account for the fact that many players play on multiple teams in one season, we only examine per-game and advanced stats rather than total stats and take the average for the player over the course of the season. Thus, we are able to obtain a relatively complete statistical profile for almost every “good” player for each year of their career.
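The filtering described above can be sketched in a few lines of base R. The rows and column names below are made up for illustration (they are not from our scraped data), and the simple unweighted mean across teams is one reading of the averaging step, not necessarily the exact calculation we ran.

```r
# Made-up rows for illustration only; column names are assumptions
player_seasons <- data.frame(
  player = c("Player A", "Player A", "Player B", "Player C"),
  season = c(2019, 2019, 2019, 2019),
  team   = c("AAA", "BBB", "CCC", "DDD"),
  pts    = c(20.0, 24.0, 8.0, 27.5),  # points per game
  vorp   = c(1.2, 1.6, 0.3, 5.1)
)

# Average each statistic over all teams a player appeared for in a season
season_profile <- aggregate(
  cbind(pts, vorp) ~ player + season,
  data = player_seasons,
  FUN = mean
)

# Keep only "good" seasons: VORP of at least 1
good_seasons <- season_profile[season_profile$vorp >= 1, ]
print(good_seasons)
```

Here Player A's two stints collapse into a single row, and Player B drops out because the VORP falls below the cutoff.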
Shot Type Graph and Table
In this graph we can see the trends in the percentage of shots in different zones for the league as a whole over the past decade. The sharp decline in the number of mid-range shots over the course of the decade is immediately apparent. Those mid-range shots have been replaced by three-point shots as well as shots in the paint (near but not right at the basket). It is also noteworthy that the percentage of shots in the restricted area has declined over the past decade even though it is the most efficient shot. This is likely due to defenses focusing on limiting these efficient shots as much as possible.
| Season | In The Paint (Non-RA) | Mid-Range | Restricted Area | Three-Pointers | Total Shots |
|---|---|---|---|---|---|
| 2010-11 | 27501 | 55532 | 57131 | 38614 | 178778 |
| 2011-12 | 21924 | 44199 | 47847 | 32744 | 146714 |
| 2012-13 | 26357 | 51477 | 59884 | 44001 | 181719 |
| 2013-14 | 28081 | 50758 | 61032 | 48829 | 188700 |
| 2014-15 | 27474 | 49125 | 60092 | 49804 | 186495 |
| 2015-16 | 27728 | 46497 | 62582 | 53458 | 190265 |
| 2016-17 | 27263 | 42295 | 61379 | 59626 | 190563 |
| 2017-18 | 29753 | 36098 | 60531 | 63451 | 189833 |
| 2018-19 | 31933 | 30331 | 66512 | 71861 | 200637 |
| 2019-20 | 27499 | 22795 | 55777 | 65162 | 171233 |
| 2020-21 | 27836 | 20151 | 47289 | 61198 | 156474 |
In the table above, it is also apparent that the total number of shots has generally increased over the past ten years (the 2011-12, 2019-20, and 2020-21 seasons were shortened, which explains their lower totals, but per-game shots are up). This is largely due to teams playing at a faster pace. While the graph and table capture trends at a high level, we would also like to see exactly where players today are taking more shots compared to ten years ago.
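The zone percentages discussed in this section can be recomputed directly from the counts in the table. The sketch below copies only the first and last rows; the column names are our own shorthand for the table headers.

```r
# Counts copied from the 2010-11 and 2020-21 rows of the table
zone_counts <- data.frame(
  season          = c("2010-11", "2020-21"),
  paint_non_ra    = c(27501, 27836),
  mid_range       = c(55532, 20151),
  restricted_area = c(57131, 47289),
  three_pointers  = c(38614, 61198)
)

zone_cols <- c("paint_non_ra", "mid_range", "restricted_area", "three_pointers")

# Total shots per season, then each zone's share of that total (in percent)
total_shots <- rowSums(zone_counts[, zone_cols])
zone_share  <- round(100 * zone_counts[, zone_cols] / total_shots, 1)
zone_share$season <- zone_counts$season

print(zone_share)
```

With these counts, the mid-range share falls from about 31.1% to 12.9%, while the three-point share rises from about 21.6% to 39.1%.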
Heat Map
Using the BallR package, we can generate heat maps of the shots in different locations around the league. The heat map provides several advantages. First, it shows that not all shots in the same zone are of the same quality. For example, three-point shots in the corners are among the most efficient shots in basketball and are significantly more valuable than three-point shots “above the break,” where the line is curved. In addition, the heat map shows how formerly popular midrange spots, such as the elbows and the baseline, have declined significantly.
Here is how the generate_heatmap_chart function is defined in BallR.
generate_heatmap_chart = function(shots, base_court, court_theme = court_themes$dark) {
  base_court +
    stat_density_2d(
      data = shots,
      aes(x = loc_x, y = loc_y, fill = stat(density / max(density))),
      geom = "raster", contour = FALSE, interpolate = TRUE, n = 200
    ) +
    geom_path(
      data = court_points,
      aes(x = x, y = y, group = desc),
      color = court_theme$lines
    ) +
    scale_fill_viridis_c(
      "Shot Frequency ",
      limits = c(0, 1),
      breaks = c(0, 1),
      labels = c("low", "high "),
      option = "inferno",
      guide = guide_colorbar(barwidth = 10)
    ) +
    theme(legend.text = element_text(size = rel(0.6)))
}
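The fill aesthetic in the function above divides the estimated density by its maximum, so the color scale runs from 0 to 1 no matter how many shots a season has. Below is a minimal sketch of that normalization on simulated shot coordinates (not real data), using MASS::kde2d, which the ggplot2 documentation names as the estimator behind stat_density_2d. The cluster positions are arbitrary assumptions.

```r
library(MASS)  # provides kde2d

set.seed(1)
# Simulated shot coordinates (not real data): a tight cluster near the rim
# and a looser cluster farther out
loc_x <- c(rnorm(300, mean = 0, sd = 20), rnorm(200, mean = 150, sd = 40))
loc_y <- c(rnorm(300, mean = 10, sd = 20), rnorm(200, mean = 180, sd = 40))

# Kernel density estimate on a 200 x 200 grid (mirrors n = 200 in the function)
dens <- kde2d(loc_x, loc_y, n = 200)

# Rescale so the densest cell equals 1, as density / max(density) does
scaled <- dens$z / max(dens$z)

print(range(scaled))
```

Because every season is rescaled to its own maximum, the heat maps compare where shots are concentrated within a season, not the raw shot volume across seasons.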
Using the other objects generated from the ballr package mentioned before, we can write a for loop to generate the heat maps for all of the desired seasons, and plot them below.
for (i in 0:10) {
  ## season label such as "2010-11"
  curr_season <- paste0(2010 + i, "-", 11 + i)
  heat_map <- output %>%
    filter(season == curr_season, shot_zone_basic %in% shot_zone_basic_list) %>%
    generate_heatmap_chart(
      base_court = plot_court(court_themes$dark),
      court_theme = court_themes$dark
    ) +
    labs(
      title = "Heat Map of All Shots",
      subtitle = paste(curr_season, "Season")
    )
  ## pass the plot explicitly instead of relying on the most recent plot
  ggsave(paste0(curr_season, ".png"), plot = heat_map)
}